Bridging the gap from darkness to solar brilliance

A UN Datathon Story

Janith Wanniarachchi

EBS, Monash

David Wu

EBS, Monash

Sundance Sun

Education, Melbourne

James Hogg

Maths, QUT

Farhan Ameen

Maths & Stats, USyd

February 29, 2024

The official bit

The Datathon

The project brief

  • Create a data solution

  • that tackles one or more of the 17 sustainable development goals

  • by leveraging one of the six key transitions

    • food systems
    • energy access and affordability
    • digital connectivity
    • education
    • jobs and security
    • climate change, biodiversity loss, and pollution
  • and focuses on the SDG localisation enabler

    • place local communities at the centre of development responses
    • enable local advocacy, local action, and local monitoring and reporting

Our project

Aim:

  • At the beginning: Solve world problems
  • Towards the middle: Slap together a half-baked solution in 3 days.

Problem:

Globally, nearly a billion people lack access to reliable energy, and solar is a cost-effective way to meet this demand.

Solution:

Map the areas of the globe where solar farm investment is likely to succeed, using existing solar farms as training data; overlay that onto a map of energy demand, proxied by night light data.
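The overlay step can be sketched with hypothetical data: combine a modelled suitability score with the demand proxy on a shared grid and rank cells by their product. The array names and the 4 x 4 toy grid are illustrative only; the real grid was 3600 x 1800.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4 x 4 grids standing in for the real 3600 x 1800 global grid
suitability = rng.random((4, 4))  # modelled chance a solar farm succeeds here
demand = rng.random((4, 4))       # demand proxy from night lights / population

# Cells that are both suitable and under-served score highest
priority = suitability * demand

# Grid coordinates of the top three candidate cells, best first
flat = np.argsort(priority, axis=None)[::-1][:3]
top = [np.unravel_index(i, priority.shape) for i in flat]
print(top)
```

The product is the simplest possible combination rule; any monotone ranking of (suitability, demand) pairs would do.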

Data Sources

| Quantity | Source | Format |
|---|---|---|
| Population density | Google Earth Engine, provided by Oak Ridge National Laboratory | tiff |
| Night light intensity | NASA, Earth at Night project | tiff |
| Biomass/land use | NASA | tiff |
| Terrain slope | Google Earth Engine, provided by USGS | tiff |
| Photovoltaic potential | Global Solar Atlas | tiff |
| Solar farm locations | S. Dunnett, hosted on awesome-gee-community-catalog and figshare | csv |

Concordance

All data were remapped from their raw forms onto a consistent grid.

library(raster)
library(terra)
library(dplyr)

# Define a common 0.1-degree global grid (3600 x 1800 cells)
rasterGrid = raster(ncols = 3600, nrows = 1800,
                    xmn = -180, xmx = 180,
                    ymn = -90, ymx = 90)
baseRaster = terra::rast(rasterGrid)

# tiffFile: path to one raw source raster
rawValues = terra::rast(tiffFile)
consistentValues = terra::resample(rawValues, baseRaster, method = "bilinear")

# Flatten to (x, y, value) rows with a cell id for joining sources
valueDataFrame = as.data.frame(consistentValues, xy = TRUE, na.rm = FALSE) %>% 
    mutate(id = 1:ncell(consistentValues))
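For intuition, the concordance step in plain numpy terms (nearest-neighbour rather than bilinear, with made-up names): sample each source raster at the centre of every cell of the common global grid.

```python
import numpy as np

def regrid_nearest(src, src_extent, out_shape):
    """Sample src (2-D, row 0 = north) onto an out_shape grid covering the
    whole globe. src_extent = (xmin, xmax, ymin, ymax) in degrees."""
    xmin, xmax, ymin, ymax = src_extent
    nrows, ncols = out_shape
    # target cell-centre coordinates on the global grid
    lons = np.linspace(-180, 180, ncols, endpoint=False) + 360 / ncols / 2
    lats = np.linspace(90, -90, nrows, endpoint=False) - 180 / nrows / 2
    # map each target coordinate to the nearest source pixel index
    cols = ((lons - xmin) / (xmax - xmin) * src.shape[1]).astype(int).clip(0, src.shape[1] - 1)
    rows = ((ymax - lats) / (ymax - ymin) * src.shape[0]).astype(int).clip(0, src.shape[0] - 1)
    return src[np.ix_(rows, cols)]

src = np.arange(12, dtype=float).reshape(3, 4)          # toy global raster
out = regrid_nearest(src, (-180, 180, -90, 90), (6, 8))  # toy common grid
```

In practice `terra::resample` also handles coordinate systems, nodata, and bilinear weights; this sketch only shows the index bookkeeping.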

Rough Model Details: Power Ratio

Regress per-area power production of existing solar farm locations on a laughably small number of factors (photovoltaic potential, land use, terrain slope).

Using “spatial” “random forest”.

library(caret)
library(ranger)

# Latitude and longitude enter as ordinary covariates,
# hence the scare quotes around "spatial"
form = power_ratio ~ biomass + slope + photovoltaic_potential + lat + lon

# solarFarmData (assumed name): covariates at existing solar farm cells
fit = caret::train(
  form,
  data = solarFarmData,
  method = "ranger"
)

Rough Model Details: Energy Demand

Demand was modelled using a proxy quantity constructed from night light intensity and population density.

import math

import polars as pl

# regressors: polars DataFrame with one row per grid cell;
# 'density' and 'nightlight' are assumed to be 0-255 raster values
(regressors
  .with_columns(
    # density is inverted, so this log is large where population is sparse
    log_pop_density = (255 - pl.col('density') + 1).log10(),
    log_nightlight = (pl.col('nightlight') + 1).log10(),
  )
  .with_columns(
    # dark, densely populated cells score highest
    demand = -(pl.col('log_pop_density') + pl.col('log_nightlight')) + math.log10(256)
  )
  .select('x', 'y', 'demand')
)

Shiny App

The experience

Day 1

So none of us had much experience with spatial data.

UN provided “data sources” - but it was just a shotgun list of other lists

Most of the day was spent collecting and sourcing data.

Initial focus was on Africa, but we couldn’t find good shapefiles or sufficiently local data for the region.

Limitations:

  • We wanted spatial data with global coverage at a resolution finer than country level.

  • Work was done on an AWS EC2 VM instance that had RStudio Server installed.

    • AWS has an image with RStudio Server Pro, but you need to add the AMI, which we couldn’t do.
    • We chose the wrong instance image (Amazon Linux) and spent a day building dependencies from source.
    • We did this again when we resized the attached EBS storage.

The madness continues…

Day 2

  • Data was concorded onto a regular grid for analysis.
    • The focus of our solution was changed to the entire world.
    • The grid size was set to 3600 x 1800 (6,480,000 cells!)
    • We don’t talk about how long I spent on this in Python before it was refactored into a single call in R . . .
  • We initially used a Facebook/Amazon data source for population data, but gave up because the data was too big and broken into tiles that didn’t match up after reconstruction.
    • I spent way too much time working on loading the data in R and stitching it together only to abandon it at the end.
  • Moved to the Google Earth Engine API (available in Python and JavaScript)
    • data exports to Google Drive, but no progress indicator
    • we had previously used this to find terrain data
  • The data was consolidated and models were built
    • So I decided to make a Shiny app to showcase our final solution
    • Used the rhino framework from Appsilon to speed up the process
    • Learned how to showcase 6 million points in leaflet
      • Had to reduce the resolution to a manageable level
      • Used rectangles to showcase areas, instead of individual dots
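The resolution trick above can be sketched in numpy (hypothetical names; the leaflet side is not shown): block-average the fine grid so that each coarse cell becomes one rectangle on the map.

```python
import numpy as np

def coarsen(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid."""
    rows, cols = grid.shape
    assert rows % factor == 0 and cols % factor == 0
    return grid.reshape(rows // factor, factor,
                        cols // factor, factor).mean(axis=(1, 3))

# 6,480,000 cells is too many markers for leaflet...
fine = np.arange(1800 * 3600, dtype=float).reshape(1800, 3600)

# ...so draw 64,800 rectangles instead
coarse = coarsen(fine, 10)
print(coarse.shape)  # (180, 360)
```

Block-averaging preserves the grand mean, so the coarse map stays faithful to the underlying values while being two orders of magnitude lighter to render.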

End of day:

  • A simple linear regression model was built as a baseline model
  • Models were trained and validated overnight
  • Shiny app was built up to a rough POC

Day 3

  • The model was iterated on throughout the day
  • R Shiny app was being developed further with interactivity and final UI touch-ups
  • Sherry became a doctor
  • Submitted our state-of-the-art solution at 6.30pm
  • People-watched until 9 p.m.
  • Everywhere good for food was closed or in the process of closing
  • First stop was Riverland Brisbane and then to Felons Barrel
    • Fun fact: The smallest measurement unit in glasses is a pony

The learnings

Learnings

  • Impress people with fancy graphics.

    • Use AI 👻 to generate images
  • Communicating the trash you have assembled is (more) important (than the quality of trash you collect)

  • R and Python can work together

  • Spatial data is a pain in the ass to work with

Acknowledgements

Money

  • QUT Centre for Data Science
  • ADSN
  • George

Teammates

  • Farhan
  • Sundance
  • Jamie

Admin

  • Tim Macuga
  • Michael Lydeamore